09/24/2002 08:30 973467658^ 



HENRY BRENDZEL 



Beutnagel 3-12-9 



IN THE UNITED STATES 
PATENT AND TRADEMARK OFFICE 



Patent Application

Inventor(s)      Mark Beutnagel
                 Joern Ostermann
                 Schuyler Quackenbusch
Case Name        Beutnagel 3-12-9
Serial No.       09/238,224
Filing Date      1/27/1999
Group Art Unit   2645
Examiner         Michael Opsasnick
Title            Advanced TTS for Facial Animation



ASSISTANT COMMISSIONER FOR PATENTS 

WASHINGTON, D.C. 20231
SIR: 

AMENDMENT REMARKS 
This is in response to an Office action dated May 3, 2002. 

Claims 1-5, 7, 10, 13-22 were rejected under 35 USC 103 as being unpatentable over
Lee, US Patent 6,088,673, in view of Campbell, US Patent 6,366,883. Applicants
respectfully traverse. 

Before proceeding substantively, applicants apologize for a slight error in the 
previous Office action response. Claim 1 was copied from the marked-up version to form a 
clean version but, through oversight, the deleted word was not actually deleted, and the 
added words were added with an underlining. The following is the correct clean copy of
claim 1.

***** 



1. A method for generating a signal rich in prosody information comprising the steps
of:

including in said signal a plurality of phonemes represented by phoneme symbols,

including in said signal a duration specification associated with each of said
phonemes,

including, for at least one of said phonemes, at least two prosody parameter
specifications, with each specification of a prosody parameter specifying a target value for
said prosody parameter and any selected point in time for reaching said target value.



* * * * * 
Discussion of the Lee reference is found in the previous Office action response, and
it is hereby incorporated by reference. The conclusion reached by applicants after a
thorough analysis of the references was that, because the last clause of claim 1 is neither
described nor suggested by Lee, claim 1 in its entirety is not anticipated or rendered
obvious by Lee. The Examiner apparently does not controvert this conclusion, because in
the instant rejection the Examiner combines the Campbell reference with the Lee reference.

Campbell describes a synthesis system that effectively comprises two sections. The 
first section analyzes a corpus of stored signals, creates indexes and weighting factors, and 
forms a corpus of data. The analysis process occurs only once. The second section
accesses this corpus of data and, in response to an input phoneme sequence, synthesizes a
speech segment (e.g., a sentence). Thus, the synthesis process can occur many times.

With reference to FIG. 1 of Campbell and the text at col. 5, lines 55-65, it is clear 
that the speech analyzer 10 and the weighting coefficient training controller 11 are in the
first section, since the processes performed by them are performed only once, and that the 
speech unit selector 12 and speech synthesizer 13 are clearly in the second section, since the 
processes they execute are executed each time a new input phoneme sequence is presented. 

The details of the speech synthesis are described in the Campbell reference starting 
at col. 10, line 55, and ending with col. 15, line 50.

A flow chart of the process carried out by speech analyzer 10 is described starting at 
col. 15, line 54, and ending at col. 16, line 33. Indeed, FIG. 4 is titled "Speech Analysis
Process."

A flowchart of the process carried out by weighting coefficient training controller 11,
spanning FIGS. 5 and 6, is described starting at col. 16, line 34, and ending at col. 17, line
29.

The Examiner cites col. 16, line 14 to col. 17, line 23, which spans a portion of the
description relating to analyzer 10 (col. 16, lines 14-33), and the description relating to
element 11 (col. 16, line 34 - col. 17, line 23). In other words, the text cited by the
Examiner relates to teachings of the first section of the synthesis system, that which relates 
to once-only analysis of the data from which (ultimately) a speech segment can be 
synthesized. It is not related to, and it teaches nothing regarding, the synthesis portion that 
occurs each time a phoneme input sequence is presented. Stated in other words, the
teachings cited by the Examiner are not related to the collection of steps for synthesizing
speech, which is precisely what claim 1 is directed to: the inventive collection of steps for
synthesizing speech. Therefore, the teachings cited by the Examiner could not possibly 
teach anything that can be combined with any other reference (Lee included) to suggest, or 
to realize, a different synthesis process. 

Furthermore, the Examiner asserts that Campbell teaches "a selected point in time
for reaching the target value," (offering the above-cited text in col. 16, line 14 through col.
17, line 23 in support), but applicants respectfully disagree that the reference, generally, and
the cited text in particular, teaches that which the Examiner asserts it teaches.

The first portion of the cited text, which relates to the analysis process that is carried
out in analyzer 10, mentions nothing that is related to reaching anything, and certainly not
reaching a target at some specified time. If the Examiner disagrees, applicants respectfully
request a more focused reference to lines where such a teaching occurs.

The second portion of the cited text, which relates to the analysis process that is 
carried out in controller 11, also mentions nothing that is related to reaching a target at some
specified time. However, the word "target" is found in this second portion, so it bears some
explaining. The word "target" is applied with reference to the particular phoneme that it is
desired to have in the corpus of data, which might not be there, and for which, therefore, the
Campbell reference selects some other phoneme that is considered to be the closest match.
The corpus of data has many phonemes, and some are closer to the target phoneme than
are others, and in order to identify the closest match to the target phoneme, the FIGS. 5 and 
6 process calculates the Euclidean cepstral distances between the phonemes that are 
considered, and the target phoneme. A weighting coefficient is then calculated and stored. 

It is respectfully submitted that this use of the term "target" has nothing to do with 
the process of synthesis, has nothing to do with a target value of anything, and it has nothing 
to do with reaching the target value at any particular time. It certainly has nothing to do 
with a prosody parameter value reaching a target level at a particular point in time. 

Therefore, it is believed that adding the teachings of Campbell to the teachings of 
Lee fails to yield or suggest the method specified in claim 1 and, therefore, it is respectfully 
submitted that claim 1 is not obvious in view of the Lee-Campbell combination of 
references. 
The same argument applies to independent claim 21.

In light of the above remarks, it is respectfully submitted that independent claims 1 
and 21, and consequently the remaining dependent claims, are not obvious in view of the
Lee reference combined with the Campbell reference. Reconsideration and allowance of all 
claims are, therefore, respectfully solicited. 



Respectfully, 
Mark Beutnagel 
Joern Ostermann 
Schuyler Quackenbusch 




Dated:

Henry T. Brendzel
Reg. No. 26,844 
Phone (973)467-2025 
Fax (973)467-6589 
email brendzel@home.com 